Indexing and Searching a Mass Spectrometry Database

نویسندگان

  • Søren Besenbacher
  • Benno Schwikowski
  • Jens Stoye
چکیده

Database preprocessing in order to create an index often permits considerable speedup in search compared to the iterated query of an unprocessed database. In this paper we apply index-based database lookup to a range search problem that arises in mass spectrometry-based proteomics: given a large collection of sparse integer sets and a sparse query set, find all the sets from the collection that have at least k integers in common with the query set. This problem arises when searching for a mass spectrum in a database of theoretical mass spectra using the shared peaks count as similarity measure. The algorithms can easily be modified to use the more advanced shared peaks intensity measure instead of the shared peaks count. We introduce three different algorithms solving these problems. We conclude by presenting some experiments using the algorithms on realistic data showing the advantages and disadvantages of the algorithms.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speeding up tandem mass spectrometry based database searching by peptide and spectrum indexing.

Database searching is the technique of choice for shotgun proteomics, and to date much research effort has been spent on improving its effectiveness. However, database searching faces a serious challenge of efficiency, considering the large numbers of mass spectra and the ever fast increase in peptide databases resulting from genome translations, enzymatic digestions, and post-translational mod...

متن کامل

وضعیت بازیابی اطلاعات در دو پایگاه نمایه و نما و سنجش اثربخشی استفاده از واژگان کنترل ‌شده در نمایه‌سازی این دو پایگاه

Purpose: This study was carried out to determine the level of precision, recall, and searching time for “Nama” and “Namayeh” databases, as well as to find out which of the indexing tools (thesaurus and Dewey decimal classification) helps us more in improvement of information retrieval. Methodology: This study is an analytical survey in which the necessary data was collected by direct observati...

متن کامل

Mining Mass Spectra: Metric Embeddings and Fast Near Neighbor Search

Mining large-scale high-throughput tandem mass spectrometry data sets is a very important problem in mass spectrometry based protein identification. One of the fundamental problems in large scale mining of spectra is to design appropriate metrics and algorithms to avoid all-pair-wise comparisons of spectra. In this paper, we present a general framework based on vector spaces to avoid pair-wise ...

متن کامل

pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry

SUMMARY Research in proteomics requires powerful database-searching software to automatically identify protein sequences in a complex protein mixture via tandem mass spectrometry. In this paper, we describe a novel database-searching software system called pFind (peptide/protein Finder), which employs an effective peptide-scoring algorithm that we reported earlier. The pFind server is implement...

متن کامل

A face in the crowd: recognizing peptides through database search.

Peptide identification via tandem mass spectrometry sequence database searching is a key method in the array of tools available to the proteomics researcher. The ability to rapidly and sensitively acquire tandem mass spectrometry data and perform peptide and protein identifications has become a commonly used proteomics analysis technique because of advances in both instrumentation and software....

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010